3.6.0#345

Merged
n2iw merged 48 commits into master from 3.6.0
May 22, 2026

n2iw commented May 22, 2026

No description provided.

wfy1997 and others added 30 commits December 16, 2025 11:47
Merge 3.5.1 into 3.6.0
Introduces a new PV puller v2 workflow with src/pv_puller_v2.py, updates configuration and constants to support the new service type, and adds MongoDao logic for upserting property PVs. Integrates the new puller into the validator controller, updates API client for batch data element retrieval, and extends MongoDB scripts and config handling for the new STS API endpoint.
Replaced references to 'CDE' with 'property' throughout pv_puller_v2.py to improve clarity and align with updated data model. Updated function names, variable names, log messages, and removed unused CDE-specific code.
…odel and property instead of cde code

Introduces insert_concept_codes_v2 in MongoDao to prevent duplicate concept code insertion and updates PVPullerV2 to use this new method. Also modifies compose_concept_code_record to use (model, property, value, concept_code) as the unique key for concept codes.
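A minimal sketch of the de-duplication idea described in this commit, assuming each concept code record is a dict with `model`, `property`, `value`, and `concept_code` fields. The field names and both helper functions are hypothetical illustrations, not the actual `insert_concept_codes_v2` implementation:

```python
from typing import Iterable


def concept_code_key(record: dict) -> tuple:
    """Build the (model, property, value, concept_code) uniqueness key.

    The field names are assumptions made for this sketch.
    """
    return (
        record["model"],
        record["property"],
        record["value"],
        record["concept_code"],
    )


def filter_new_concept_codes(incoming: Iterable[dict], existing_keys: Iterable[tuple]) -> list:
    """Return only the records whose key is not already stored.

    Duplicates inside the incoming batch are dropped as well, so the
    result can be inserted without creating repeated concept codes.
    """
    seen = set(existing_keys)
    fresh = []
    for record in incoming:
        key = concept_code_key(record)
        if key in seen:
            continue  # already stored, or repeated earlier in this batch
        seen.add(key)
        fresh.append(record)
    return fresh
```

In a real DAO the same guarantee could also come from a unique compound index on those four fields; the in-memory filter above only shows the key semantics.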
Renamed variables and log messages in mongo_dao.py for clarity, changing references from 'cde' to 'property'. Improved error handling in pv_puller_v2.py by raising an exception when no model is configured for property PV pulling.
Add sts v2 api support and property PV handling
Introduce support for STS v2 property-level permissible values and update codepaths to retrieve and persist them. Add new STS API v2 endpoint config (sts_api_one_url_v2) and update mongo DB script with the v2 URL template. Extend DataModel with get_model_version and get_data_commons. Update MongoDao to upsert/get property PVs by property/model/version and to lookup concept codes by property+model. Replace legacy PV puller usage with a new pv_puller_v2 that fetches property PVs (get_pv_by_property_version, get_all_pvs_by_version), processes results, and saves PVs, synonyms, and concept codes to DB. Update metadata validator to use the new PV retrieval flow (model/version-based lookup and STS v2 fallback) and adjust controller to call the v2 puller. Miscellaneous logging/message tweaks and small refactors to parameter names.
Add in-memory caches and refactor permissive-value checks to reduce DB calls and simplify logic.

- src/common/mongo_dao.py: add props and concept_codes dicts and cache results in get_property_permissible_values and get_concept_code_by_pv to avoid repeated Mongo queries.
- src/common/utils.py: add CDE_PERMISSIVE_VALUES import and new has_permissive_value(prop) helper to centralize permissive-value detection.
- src/metadata_validator.py: import and use has_permissive_value to replace repetitive permissive-value checks and cleanup related code paths.
- src/config.py: simplify STS resource assignments by using sts_resource.get(...) directly.

These changes aim to improve performance (fewer DB reads), reduce duplicated code, and make permissive-value handling clearer.
…, and use a single layer dict with concat keys

Migrate permissible-value handling from CDE-centric to property-centric (STS v2). Added PROPERTY_PERMISSIBLE_VALUES and PROPERTY_TERM constants; updated utils and metadata_validator to use property terms and a not_found_property flag. Refactored mongo DAO to store/fetch property PVs and concept codes using composite cache keys (model_version_property) and to use the new permissible-values field. Removed legacy pv_puller.py and updated pv_puller_v2.py to use property-focused naming, API endpoints and extraction logic. Adjusted imports (config/validator) accordingly.
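The composite cache key mentioned above (`model_version_property`) can be sketched as a single-layer dict in front of the database lookup. The class name, the `loader` callable (standing in for the Mongo query), and the underscore separator are assumptions for illustration, not the code in `mongo_dao.py`:

```python
from typing import Callable, Dict, List


class PropertyPVCache:
    """Single-layer cache keyed by a concatenated model/version/property string."""

    def __init__(self, loader: Callable[[str, str, str], List[str]]):
        # `loader` is a hypothetical stand-in for the real database query.
        self._loader = loader
        self._props: Dict[str, List[str]] = {}

    @staticmethod
    def make_key(model: str, version: str, prop: str) -> str:
        # Concatenate the three parts into one flat key.
        return f"{model}_{version}_{prop}"

    def get(self, model: str, version: str, prop: str) -> List[str]:
        key = self.make_key(model, version, prop)
        if key not in self._props:
            # Cache miss: query once and remember the result.
            self._props[key] = self._loader(model, version, prop)
        return self._props[key]
```

One design caveat with concatenated keys: if any part can itself contain the separator, two different triples may produce the same key, so a tuple key `(model, version, prop)` is a safer alternative when that is possible.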
Add STS v2 property PV support and retrieval to the metadata validator
Introduce a new pytest suite for pv_puller_v2 covering PVPullerV2 behavior and helper functions. Tests use fixtures and extensive mocking (unittest.mock) to validate initialization, pulling flow (pull_property_pv_synonym_concept_codes), API interactions (retrieveAllPropertyViaAPI, get_pv_by_property_version, get_all_pvs_by_version), processing logic (process_sts_property_pv, extract_pv_list), record composition (compose_property_record, compose_synonym_record, compose_concept_code_record), and the top-level pull_pv_lists_v2 orchestration. The suite includes success cases, edge/error handling (empty results, upsert failures, API exceptions, KeyboardInterrupt), and an integration-style test for end-to-end flow with mocked API and Mongo DAO.
- batched metadata validation
- added batched validation progress tracking in the validation document
- added validation status details in records
- added failed validation status to indicate bad inputs for validation

.
- Add `Validate Metadata Batch` SQS message type for processing records in backend-defined chunks instead of fetching all records internally
- Add `_process_metadata_batch` handler with per-batch error tracking and atomic finalization via `increment_completed_batches` (`$inc`, `$max`, `$push`)
- Add `get_dataRecords_by_ids` to fetch records by explicit `_id` list from batch messages
- Add guard filter (`metadataEnded: null`) and `$unset` of tracking fields to prevent double-finalization on message redelivery
- Extend `update_validation_status` with `status_detail`, `unset_fields`, and `guard_filter` parameters
- Extend `set_submission_validation_status` with `status_detail` for surfacing per-batch failure summaries
- Extract `_initialize_for_validation` from `MetaDataValidator.validate` to share setup between batch and non-batch paths
- Extract `_process_metadata_validation` and `_process_cross_submission` helpers from the main SQS loop
- Replace `FAILED` constant with `STATUS_ERROR` across metadata and cross-submission validators
- Fix `DATA_COLlECTION` typo to `DATA_COLLECTION`
- Add circular parent reference guard in `get_file_consent_code`
- Remove duplicate error/warning block in `validate_nodes`
- Add comprehensive batch validation test suite (`test_metadata_batch_validation.py`)
- Add `BATCHED_METADATA_VALIDATION_INTERFACE.md` documenting the SQS contract, DB schema, and error handling
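The atomic finalization step in the list above combines `$inc`, `$max`, and `$push` in one update. A rough sketch of what such an update document could look like follows; the field names `completedBatches` and `batchErrors`, and the way `$max` is applied to `totalBatches`, are assumptions for illustration rather than the schema documented in `BATCHED_METADATA_VALIDATION_INTERFACE.md`:

```python
from typing import Optional


def build_batch_completion_update(total_batches: int, batch_error: Optional[dict] = None) -> dict:
    """Build a single MongoDB update document for one finished batch.

    All three operators run in the same atomic update, so concurrent
    workers cannot lose each other's increments.
    """
    update = {
        # Count this batch as completed.
        "$inc": {"completedBatches": 1},
        # Keep the largest batch total seen so far (assumed usage of $max).
        "$max": {"totalBatches": total_batches},
    }
    if batch_error is not None:
        # Record the per-batch failure for later summaries.
        update["$push"] = {"batchErrors": batch_error}
    return update


def is_last_batch(updated_doc: dict) -> bool:
    """Decide from the post-update document whether finalization should run."""
    return updated_doc["completedBatches"] >= updated_doc["totalBatches"]
```

With PyMongo, a document like this would typically be passed to `find_one_and_update` with `return_document=ReturnDocument.AFTER`, so that exactly one worker observes the final count and performs the finalization.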

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve conflicts by accepting the simplified remote revision:
- Remove guard_filter from update_validation_status
- Default totalBatches to 1; simplify < 1 check
- Remove batch summary detail for successful-but-errored batches
- Always update submission status on last batch regardless of update_ok
- Simplify _process_metadata_validation (remove None submission guard)
- Remove tests for removed features (guard_filter, batch summary, missing totalBatches)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Restored failed status for validation records
- include submission, validation, and batch ids in all applicable logs
- add unit testing coverage to ensure the failed status is not written to the submission record
- ensure that when the validation scope is New, the existing submission validation status is considered
Add comprehensive tests for pv_puller_v2
AustinSMueller and others added 18 commits March 16, 2026 16:55
- parse the deleteOrphanedDataFiles flag from the delete metadata SQS messages
- when data files are not deleted along with metadata, update the submission to add orphaned file errors
Fixed an issue where deleting one parent deleted all parents of the same type
Removed outdated unit tests.
Update Python base image to 3.14.4-alpine3.23
Fixed all fixable CVEs
Comment on lines +14 to +34
name: Run tests and upload coverage
runs-on: ubuntu-latest
steps:
  - name: Checkout repository
    uses: actions/checkout@v4
    with:
      submodules: true
  - name: Set up Python
    uses: actions/setup-python@v5
    with:
      python-version: '3.12'
  - name: Install dependencies
    run: |
      python -m pip install --upgrade pip
      pip install -r requirements.txt
      pip install pytest-cov coveralls
  - name: Run tests with coverage
    run: |
      pytest --cov=src --cov-report=xml --cov-report=term-missing --ignore=src/bento
  - name: Coveralls GitHub Action
    uses: coverallsapp/github-action@v2
n2iw merged commit ca2a897 into master on May 22, 2026
14 checks passed
n2iw deleted the 3.6.0 branch on May 22, 2026 21:04
4 participants